WHAT IS CLAIMED IS: 



1. A method for analyzing a plurality of transcript sequences in a cluster comprising:

aligning the transcript sequences with genomic sequences; and 

determining whether the clusters need to be modified according to the aligning. 

2. The method of Claim 1 wherein the step of determining comprises classifying a
cluster as a chimeric cluster if the cluster is aligned to two separate locations in the 
genomic sequence. 

3. The method of Claim 2 wherein the chimeric cluster has at least 5% of its sequences
aligned to each of the two separate locations. 

4. The method of Claim 3 wherein the chimeric cluster has at least 10% of its
sequences aligned to each of the two separate locations. 

5. The method of Claim 4 wherein the chimeric cluster has at least 20% of its
sequences aligned to each of the two separate locations. 

6. The method of Claim 5 wherein the chimeric cluster has at least 30% of its
sequences aligned to each of the two separate locations. 
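
As an illustration only, the percentage thresholds recited for chimeric clusters above can be sketched as follows. The function name `is_chimeric`, the locus labels, and the representation of a cluster as a list holding one locus label per aligned sequence are hypothetical and are not taken from the claims.

```python
from collections import Counter

def is_chimeric(read_loci, min_fraction=0.05):
    """Return True if at least two distinct loci each hold at least
    `min_fraction` of the cluster's sequences.

    `read_loci` is an assumed representation: one locus label per
    transcript sequence in the cluster, as reported by an aligner.
    """
    if not read_loci:
        return False
    total = len(read_loci)
    counts = Counter(read_loci)
    # Loci supported by a large enough share of the cluster.
    supported = [locus for locus, n in counts.items() if n / total >= min_fraction]
    return len(supported) >= 2

# Toy cluster: 90 sequences at one locus, 10 at another.
toy = ["locusA"] * 90 + ["locusB"] * 10
print(is_chimeric(toy, min_fraction=0.05))
print(is_chimeric(toy, min_fraction=0.20))
```

Raising `min_fraction` from 0.05 toward 0.30 corresponds to the progressively stricter thresholds listed above: fewer clusters are flagged, but each flagged cluster has stronger support at both locations.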

7. The method of Claims 4 or 5 further comprising subclustering the chimeric clusters;
realigning subclusters to the genomic sequence; and analyzing the re-aligning to 
determine chimeric clusters. 

8. The method of Claim 7 wherein the process is repeated until no chimeric cluster is
detected. 
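
The repeated subclustering and re-aligning described above might be organized as a simple loop. The sketch below is a minimal illustration under stated assumptions: `resolve_chimeras`, `toy_align`, `toy_subcluster`, the `TOY_LOCI` lookup table, and the `max_rounds` safety limit are all hypothetical, and a real aligner and subclustering procedure would replace the toy stand-ins.

```python
from collections import Counter, defaultdict

def is_chimeric(loci, min_fraction=0.10):
    """True if two or more loci each hold >= min_fraction of the sequences."""
    if not loci:
        return False
    counts = Counter(loci)
    return sum(1 for n in counts.values() if n / len(loci) >= min_fraction) >= 2

def resolve_chimeras(cluster, align, subcluster, max_rounds=10):
    """Split chimeric clusters and re-align the pieces until none is chimeric.

    `align(cluster)` returns one locus label per sequence;
    `subcluster(cluster, loci)` returns a list of smaller clusters.
    `max_rounds` is an assumed guard against non-terminating splits.
    """
    pending, resolved = [cluster], []
    for _ in range(max_rounds):
        if not pending:
            break
        next_pending = []
        for current in pending:
            loci = align(current)          # (re-)align to the genome
            if is_chimeric(loci):
                next_pending.extend(subcluster(current, loci))
            else:
                resolved.append(current)
        pending = next_pending
    return resolved + pending              # leftovers if the limit was hit

# Toy stand-ins for an aligner and a subclustering step.
TOY_LOCI = {"s1": "locusA", "s2": "locusA", "s3": "locusB", "s4": "locusB"}

def toy_align(cluster):
    return [TOY_LOCI[seq_id] for seq_id in cluster]

def toy_subcluster(cluster, loci):
    groups = defaultdict(list)
    for seq_id, locus in zip(cluster, loci):
        groups[locus].append(seq_id)
    return list(groups.values())

result = resolve_chimeras(["s1", "s2", "s3", "s4"], toy_align, toy_subcluster)
print(result)
```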

9. The method of Claim 1 wherein the step of determining comprises detecting clusters
with consensus which overlap in genomic space. 

10. The method of Claim 9 further comprising merging the clusters with consensus
which overlap in genomic space. 

11. The method of Claim 1 wherein the step of determining comprises detecting clusters
with consensus within 1000 bases and on the same strand. 

12. The method of Claim 11 further comprising merging the clusters with consensus
within 1000 bases and on the same strand. 
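
The two merge criteria above (consensus sequences that overlap in genomic space, or that lie within 1000 bases on the same strand) can be illustrated with a small interval test. This is a sketch under assumptions: the `Consensus` record, the half-open coordinate convention, the requirement that both alignments fall on the same chromosome, and the reading of "within 1000 bases" as a gap of at most 1000 bases between the aligned intervals are all illustrative choices, not definitions from the claims.

```python
from typing import NamedTuple

class Consensus(NamedTuple):
    """Assumed record for where a cluster consensus aligns on the genome."""
    chrom: str
    strand: str   # "+" or "-"
    start: int    # 0-based, inclusive
    end: int      # exclusive

def gap(a, b):
    """Bases between two intervals; negative when they overlap."""
    return max(a.start, b.start) - min(a.end, b.end)

def should_merge(a, b, max_gap=1000):
    """True if the two consensus alignments overlap, or sit on the same
    strand with at most `max_gap` bases between them."""
    if a.chrom != b.chrom:
        return False
    distance = gap(a, b)
    if distance < 0:
        return True                       # overlap in genomic space
    return a.strand == b.strand and distance <= max_gap

a = Consensus("chr1", "+", 100, 500)
near_same_strand = Consensus("chr1", "+", 1200, 1500)   # 700 bases away
near_other_strand = Consensus("chr1", "-", 1200, 1500)
overlapping = Consensus("chr1", "-", 400, 600)
print(should_merge(a, near_same_strand))
print(should_merge(a, near_other_strand))
print(should_merge(a, overlapping))
```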

13. A method for trimming a transcript sequence comprising: aligning the transcript
sequence to its corresponding genomic sequence; and removing a side sequence of the
transcript sequence if the side is poorly aligned with the genomic sequence.

14. The method of Claim 13 wherein the transcript sequence aligns with the genomic 
sequence with at least 80% identity. 

15. The method of Claim 14 wherein the transcript sequence aligns with the genomic 
sequence with at least 90% identity. 
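
One way to picture the trimming described above is a sliding-window identity check over the aligned transcript. The claims do not define "poorly aligned", so the window size, the per-window identity cutoff, the per-base `matches` flags, and the function name `trim_poor_ends` are assumptions made purely for this sketch.

```python
def trim_poor_ends(seq, matches, window=10, min_identity=0.8):
    """Drop the ends of `seq` that lie outside any well-aligned window.

    `matches[i]` is True when base i of the transcript matches the
    genomic base it is aligned to (an assumed input format).
    A window is "good" when its fraction of matches is >= min_identity.
    """
    if len(seq) != len(matches):
        raise ValueError("seq and matches must have the same length")
    n = len(seq)
    if n < window:
        return seq                         # too short to evaluate; left as-is
    good_starts = [
        i for i in range(n - window + 1)
        if sum(matches[i:i + window]) / window >= min_identity
    ]
    if not good_starts:
        return ""                          # no part aligns well
    # Keep everything from the first good window to the end of the last one.
    return seq[good_starts[0]:good_starts[-1] + window]

# Toy transcript: 8 unaligned bases, 20 matching bases, 8 unaligned bases.
toy_seq = "T" * 8 + "ACGT" * 5 + "G" * 8
toy_matches = [False] * 8 + [True] * 20 + [False] * 8
print(trim_poor_ends(toy_seq, toy_matches, window=10, min_identity=1.0))
```

With a cutoff below 1.0, a few mismatching bases at each boundary are tolerated and retained, which is the usual trade-off between trimming aggressively and preserving sequence.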

16. A method of designing a nucleic acid probe array comprising: 

aligning a plurality of transcript sequences in a cluster to their corresponding 
genomic sequence; 

modifying the clusters according to their aligning to the genomic sequence to obtain 
at least one modified cluster; and 

selecting probes targeting the at least one modified cluster. 

17. The method of Claim 16 wherein the step of modifying comprises subclustering
chimeric clusters. 

18. The method of Claim 17 wherein a cluster is classified as a chimeric cluster if the
cluster is aligned to two separate locations in the genomic sequence. 

19. The method of Claim 18 wherein the chimeric cluster has at least 5% of its 
sequences aligned to each of the two separate locations. 

20. The method of Claim 19 wherein the chimeric cluster has at least 10% of its 
sequences aligned to each of the two separate locations. 

21. The method of Claim 20 wherein the chimeric cluster has at least 20% of its
sequences aligned to each of the two separate locations. 

22. The method of Claim 21 wherein the chimeric cluster has at least 30% of its
sequences aligned to each of the two separate locations. 

23. The method of Claim 16 wherein the step of modifying comprises merging the 
clusters with consensus which overlap in genomic space. 

24. The method of Claim 16 further comprising merging the clusters with consensus
within 1000 bases and on the same strand. 

25. A method of designing a nucleic acid probe array comprising:

aligning a transcript sequence to its corresponding genomic sequence;

trimming a side of the transcript sequence to obtain a trimmed transcript sequence if
the side of the transcript sequence is poorly aligned with the genomic sequence; and

selecting probes targeting the trimmed transcript sequence or clusters including the 
trimmed transcript sequence. 

26. A computer readable medium comprising computer-executable instructions for
performing the method comprising: 

aligning transcript sequences from a cluster with genomic sequences; and 
determining whether the clusters need to be modified according to the aligning. 

27. The computer readable medium of Claim 26 wherein the step of determining 
comprises classifying a cluster as a chimeric cluster if the cluster is aligned to two 
separate locations in the genomic sequence.

28. The computer readable medium of Claim 27 wherein the chimeric cluster has at 
least 5% of its sequences aligned to each of the two separate locations. 

29. The computer readable medium of Claim 28 wherein the chimeric cluster has at 
least 10% of its sequences aligned to each of the two separate locations. 

30. The computer readable medium of Claim 29 wherein the chimeric cluster has at 
least 20% of its sequences aligned to each of the two separate locations. 

31. The computer readable medium of Claim 30 wherein the chimeric cluster has at
least 30% of its sequences aligned to each of the two separate locations. 

32. The computer readable medium of Claims 29, 30 or 31 further comprising
subclustering the chimeric clusters; realigning subclusters to the genomic sequence; 
and analyzing the re-aligning to determine chimeric clusters. 

33. The computer readable medium of Claim 32 wherein the process is repeated until no 
chimeric cluster is detected. 

34. The computer readable medium of Claim 33 wherein the step of determining 
comprises detecting clusters with a consensus that overlaps in the genomic space. 

35. The computer readable medium of Claim 34 further comprising merging the clusters
with consensus which overlap in genomic space. 



36. The computer readable medium of Claim 25 wherein the step of determining
comprises detecting clusters with consensus within 1000 bases and on the same
strand. 



37. The computer readable medium of Claim 36 further comprising merging the clusters 
with consensus within 1000 bases and on the same strand. 



38. A computer readable medium comprising computer-executable instructions for 
performing the method comprising: aligning a transcript sequence to its 
corresponding genomic sequence; removing a side sequence of the transcript 
sequence if the side is poorly aligned with the genomic sequence. 

39. The computer readable medium of Claim 38 wherein the transcript sequence aligns 
with the genomic sequence with at least 80% identity. 

40. The computer readable medium of Claim 39 wherein the transcript sequence aligns 
with the genomic sequence with at least 90% identity. 



41. A computer readable medium comprising computer-executable instructions for 
performing the method comprising: 

aligning a plurality of transcript sequences in a cluster to their corresponding 
genomic sequence; 

modifying the cluster according to their aligning to the genomic sequence to obtain at 
least one modified cluster; and 

selecting probes targeting the at least one modified cluster. 

42. The computer readable medium of Claim 42 wherein the step of modifying
comprises subclustering a chimeric cluster. 



43. The computer readable medium of Claim 42 wherein a cluster is classified as a
chimeric cluster if the cluster is aligned to two separate locations in the genomic 
sequence. 

44. The computer readable medium of Claim 43 wherein the chimeric cluster has at 
least 5% of its sequences aligned to each of the two separate locations. 

45. The computer readable medium of Claim 44 wherein the chimeric cluster has at 
least 10% of its sequences aligned to each of the two separate locations. 

46. The computer readable medium of Claim 45 wherein the chimeric cluster has at 
least 20% of its sequences aligned to each of the two separate locations. 

47. The computer readable medium of Claim 46 wherein the chimeric cluster has at 
least 30% of its sequences aligned to each of the two separate locations. 

48. The computer readable medium of Claim 47 wherein the step of modifying 
comprises merging the clusters with consensus which overlap in genomic space. 

49. The computer readable medium of Claim 48 further comprising merging the
clusters with consensus within 1000 bases and on the same strand. 

50. A computer readable medium comprising computer-executable instructions for 
performing the method of 

aligning a transcript sequence to its corresponding genomic sequence; 

trimming a side of the transcript sequence to obtain a trimmed transcript sequence if
the side of the transcript sequence is poorly aligned with the genomic sequence; and

selecting probes targeting the trimmed transcript sequence or clusters including the 
trimmed transcript sequence. 